Deep neural network (DNN) pruning methods typically fall into two categories: 1) weight-based deterministic constraints and 2) probabilistic frameworks. While each approach has its merits and limitations, a common set of practical issues, such as trial-and-error sensitivity analysis and hyperparameter tuning, plagues both. In this work, we propose a new single-shot, automated pruning algorithm called Slimming Neural networks using Adaptive Connectivity Scores (SNACS). Our proposed approach combines a probabilistic pruning framework with constraints on the underlying weight matrices, through a novel connectivity measure, at multiple levels, to capitalize on the strengths of both approaches while solving their deficiencies. In SNACS, we propose a fast hash-based estimator of Adaptive Conditional Mutual Information (ACMI) that uses a weight-based scaling criterion to evaluate the connectivity between filters and prune unimportant ones. To automatically determine the limit up to which a layer can be pruned, we propose a set of operating constraints that together define an upper pruning percentage limit for every layer in a deep network. Finally, we define a novel sensitivity criterion for filters that measures the strength of their contribution to the succeeding layer and highlights critical filters that need to be protected from pruning entirely. Through our experimental validation, we show that SNACS is over 17x faster than its closest comparable method and is the state-of-the-art single-shot pruning method across three standard DNN pruning benchmarks: CIFAR10-VGG16, CIFAR10-ResNet56, and ILSVRC2012-ResNet50.
translated by Google Translate
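The weight-based half of the approach can be illustrated with a generic magnitude-based filter-saliency sketch. This is not the SNACS algorithm itself: the layer shape and the 50% upper pruning limit are hypothetical, and the ACMI connectivity estimator is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conv layer weights: (out_channels, in_channels, kH, kW).
weights = rng.normal(size=(8, 3, 3, 3))

# Score each filter by the L1 norm of its weights (a common weight-based
# saliency; SNACS combines such weight scaling with a connectivity
# measure, which this sketch omits).
scores = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)

# Prune the fraction of filters with the lowest scores, subject to a
# per-layer upper pruning limit (here 50%, an illustrative constraint).
upper_limit = 0.5
n_prune = int(upper_limit * len(scores))
prune_idx = np.argsort(scores)[:n_prune]

mask = np.ones(len(scores), dtype=bool)
mask[prune_idx] = False
pruned = weights[mask]   # surviving filters
```

In a real single-shot pipeline this scoring pass would run once per layer, with the per-layer limits determined jointly rather than fixed a priori.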
We consider the contextual bandit problem on general action and context spaces, where the learner's rewards depend on their selected actions and an observable context. This generalizes the standard multi-armed bandit to the case where side information is available, e.g., patients' records or customers' history, which allows for personalized treatment. We focus on consistency -- vanishing regret compared to the optimal policy -- and show that for large classes of non-i.i.d. contexts, consistency can be achieved regardless of the time-invariant reward mechanism, a property known as universal consistency. Precisely, we first give necessary and sufficient conditions on the context-generating process for universal consistency to be possible. Second, we show that there always exists an algorithm that guarantees universal consistency whenever this is achievable, called an optimistically universal learning rule. Interestingly, for finite action spaces, learnable processes for universal learning are exactly the same as in the full-feedback setting of supervised learning, previously studied in the literature. In other words, learning can be performed with partial feedback without any generalization cost. The algorithms balance a trade-off between generalization (similar to structural risk minimization) and personalization (tailoring actions to specific contexts). Lastly, we consider the case of added continuity assumptions on rewards and show that these lead to universal consistency for significantly larger classes of data-generating processes.
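For intuition, here is a minimal epsilon-greedy learner on a toy bandit with two finite contexts and two actions. All context names and reward values are made up, and the paper's optimistically universal rule for general action and context spaces is far more involved; this only shows the basic personalization effect of conditioning on context.

```python
import random

random.seed(0)

contexts, actions = ["young", "old"], [0, 1]
# Hypothetical mean rewards per (context, action) pair.
true_reward = {("young", 0): 0.2, ("young", 1): 0.8,
               ("old", 0): 0.7, ("old", 1): 0.3}

counts = {(c, a): 0 for c in contexts for a in actions}
totals = {(c, a): 0.0 for c in contexts for a in actions}

def choose(context, eps=0.1):
    # Explore with probability eps, otherwise act greedily on the
    # empirical mean reward conditioned on the observed context.
    if random.random() < eps:
        return random.choice(actions)
    return max(actions,
               key=lambda a: totals[(context, a)] / max(counts[(context, a)], 1))

for t in range(5000):
    c = random.choice(contexts)
    a = choose(c)
    r = true_reward[(c, a)] + random.gauss(0.0, 0.1)  # noisy bandit feedback
    counts[(c, a)] += 1
    totals[(c, a)] += r

# Empirically best action per context after learning.
best = {c: max(actions, key=lambda a: totals[(c, a)] / max(counts[(c, a)], 1))
        for c in contexts}
```

The learner recovers a different best action per context, which is exactly the personalization that side information enables over a context-free bandit.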
Athletes routinely undergo fitness evaluations to assess their training progress. Typically, these evaluations require a trained professional who utilizes specialized equipment like force plates. For the assessment, athletes perform drop and squat jumps, and key variables are measured, e.g. velocity, flight time, and time to stabilization, to name a few. However, amateur athletes may not have access to professionals or equipment that can provide these assessments. Here, we investigate the feasibility of estimating key variables using video recordings. We focus on jump velocity as a starting point because it is highly correlated with other key variables and is important for determining posture and lower-limb capacity. We find that velocity can be estimated with a high degree of precision across a range of athletes, with an average R-value of 0.71 (SD = 0.06).
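The reported R-value is a Pearson correlation between the video-based estimates and reference measurements; a minimal computation on made-up velocity data (the numbers below are illustrative, not from the study):

```python
import math

true_v = [2.1, 2.5, 1.9, 3.0, 2.7, 2.2]   # e.g. force-plate velocities (m/s)
est_v  = [2.0, 2.6, 2.0, 2.9, 2.5, 2.3]   # hypothetical video-based estimates

def pearson_r(x, y):
    # Pearson correlation: covariance over the product of standard
    # deviations, computed from scratch for transparency.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(true_v, est_v)
```

In the study, such a correlation would be computed per athlete against force-plate ground truth and then averaged across athletes.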
Previous work has shown that a neural network with the rectified linear unit (ReLU) activation function leads to a convex polyhedral decomposition of the input space. These decompositions can be represented by a dual graph with vertices corresponding to polyhedra and edges corresponding to polyhedra sharing a facet, which is a subgraph of a Hamming graph. This paper illustrates how one can utilize the dual graph to detect and analyze adversarial attacks in the context of digital images. When an image passes through a network containing ReLU nodes, the firing or non-firing at a node can be encoded as a bit ($1$ for ReLU activation, $0$ for ReLU non-activation). The sequence of all bit activations identifies the image with a bit vector, which identifies it with a polyhedron in the decomposition and, in turn, identifies it with a vertex in the dual graph. We identify ReLU bits that are discriminators between non-adversarial and adversarial images and examine how well collections of these discriminators can ensemble vote to build an adversarial image detector. Specifically, we examine the similarities and differences of ReLU bit vectors for adversarial images, and their non-adversarial counterparts, using a pre-trained ResNet-50 architecture. While this paper focuses on adversarial digital images, ResNet-50 architecture, and the ReLU activation function, our methods extend to other network architectures, activation functions, and types of datasets.
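The bit-vector encoding can be sketched on a toy two-layer ReLU network (random weights, not ResNet-50): each unit contributes one bit recording whether its ReLU fired, and the resulting vector identifies the input's polyhedron, so the Hamming distance between two codes counts how many half-space constraints flipped.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-layer ReLU network with 5 + 4 hidden units.
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(4, 5)), rng.normal(size=4)

def bit_vector(x):
    # 1 where a ReLU fires, 0 where it does not, concatenated over layers.
    h1 = W1 @ x + b1
    bits1 = (h1 > 0).astype(int)
    h2 = W2 @ np.maximum(h1, 0) + b2
    bits2 = (h2 > 0).astype(int)
    return np.concatenate([bits1, bits2])

x = rng.normal(size=3)
x_adv = x + 0.3 * rng.normal(size=3)   # stand-in for an adversarial perturbation

# Hamming distance between the two codes: the number of activation
# pattern bits the perturbation flipped.
hamming = int(np.sum(bit_vector(x) != bit_vector(x_adv)))
```

Discriminator bits are then those positions that flip systematically for adversarial inputs but not for benign ones.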
Graph convolutional neural networks (GCNNs) are a popular class of deep learning (DL) models in materials science used to predict material properties from the graph representation of molecular structures. Training an accurate and comprehensive GCNN surrogate for molecular design requires large-scale graph datasets and is usually a time-consuming process. Recent advances in GPUs and distributed computing open a path to reduce the computational cost of GCNN training effectively. However, efficient utilization of high performance computing (HPC) resources for training requires simultaneously optimizing large-scale data management and scalable stochastic batched optimization techniques. In this work, we focus on building GCNN models on HPC systems to predict material properties of millions of molecules. We use HydraGNN, our in-house library for large-scale GCNN training, leveraging distributed data parallelism in PyTorch. We use ADIOS, a high-performance data management framework, for efficient storage and reading of large molecular graph data. We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap. We measure the scalability, accuracy, and convergence of our approach on two DOE supercomputers: the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) and the Perlmutter system at the National Energy Research Scientific Computing Center (NERSC). We present our experimental results with HydraGNN, showing i) a 4.2x reduction in data loading time compared with a conventional method, and ii) linear scaling of training performance up to 1,024 GPUs on both Summit and Perlmutter.
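The data-parallel training pattern can be shown in miniature: each "rank" computes a gradient on its own data shard, and the gradients are averaged before every update, which is the all-reduce that PyTorch's DistributedDataParallel performs under the hood. The objective below is a toy linear regression simulated in a single process, not HydraGNN or a GCNN.

```python
import numpy as np

rng = np.random.default_rng(0)

n_ranks = 4
X = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])   # hypothetical ground-truth weights
y = X @ true_w

w = np.zeros(3)
for step in range(200):
    shards = np.array_split(np.arange(len(X)), n_ranks)
    grads = []
    for idx in shards:                      # one shard per rank
        err = X[idx] @ w - y[idx]
        grads.append(X[idx].T @ err / len(idx))
    # "All-reduce": average the per-rank gradients, then take one
    # SGD step with the shared result, keeping all ranks in sync.
    w -= 0.1 * np.mean(grads, axis=0)
```

Because the shards are equal-sized, the averaged gradient equals the full-batch gradient, which is why data parallelism leaves the optimization trajectory unchanged while splitting the per-step work across GPUs.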
We study universal consistency of non-i.i.d. processes in the context of online learning. A stochastic process is said to admit universal consistency if there exists a learner that achieves vanishing average loss for any measurable response function on this process. When the loss function is unbounded, Blanchard et al. showed that the only processes admitting strong universal consistency are those that take a finite number of values almost surely. However, when the loss function is bounded, the class of processes admitting strong universal consistency is much richer, and its characterization could depend on the response setting (Hanneke). In this paper, we show that this class of processes is independent of the response setting, thereby closing an open question (Hanneke, Open Problem 3). Specifically, we show that the processes admitting universal online learning are the same for binary classification as for multiclass classification with a countable number of classes. Consequently, any output setting with bounded loss can be reduced to binary classification. Our reduction is constructive and practical. Indeed, we show that the nearest neighbor algorithm is transported by our construction. For binary classification on processes admitting strong universal learning, we prove that nearest neighbor successfully learns at least all unions of intervals.
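A minimal online 1-nearest-neighbor rule makes the setting concrete: predict the label of the closest previously seen instance, then observe the true label. This sketch is illustrative only; the paper concerns when such memorization-based rules are universally consistent, which a single toy stream does not establish.

```python
def online_1nn(stream):
    # Online protocol: at each round, predict from past examples,
    # then receive the true label and add the pair to memory.
    seen = []          # (x, y) pairs observed so far
    mistakes = 0
    for x, y in stream:
        if seen:
            _, pred = min(seen, key=lambda p: abs(p[0] - x))
            mistakes += int(pred != y)
        seen.append((x, y))
    return mistakes

# Binary labels given by a threshold on [0, 1]: an interval-type target,
# the kind of function the abstract's last claim covers.
stream = [(i / 100, int(i / 100 >= 0.5)) for i in range(100)]
mistakes = online_1nn(stream)
```

On this monotone stream the rule errs only at the single boundary crossing, illustrating how nearest neighbor picks up threshold (interval) targets from partial feedback-free full labels.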
We study the regression problem on a compact manifold M. In order to take advantage of the underlying geometry and topology of the data, the regression task is performed on the basis of the first few eigenfunctions of the Laplace-Beltrami operator of the manifold, regularized with topological penalties. The proposed penalties are based on the topology of the sub-level sets of either the eigenfunctions or the estimated function. The overall approach is shown to yield promising and competitive performance on various applications to both synthetic and real data sets. We also provide theoretical guarantees on the regression function estimate, on both its prediction error and its smoothness (in a topological sense). Taken together, these results support the relevance of our approach in the case where the target function is "topologically smooth."
Autonomous driving consists of multiple interacting modules, where each module must contend with errors from the others. Typically, the motion prediction module depends on a robust tracking system to capture each agent's past movement. In this work, we systematically explore the importance of the tracking module for the motion prediction task and ultimately conclude that overall motion prediction performance is highly sensitive to imperfections of the tracking module. We explicitly compare models that use tracking information to models that do not, across multiple scenarios and conditions. We find that tracking information plays an essential role and improves motion prediction performance in noise-free conditions. However, in the presence of tracking noise, it can harm overall performance if not studied thoroughly. We therefore argue that developers should account for noise when developing and testing motion/tracking modules, or else consider tracking-free alternatives.
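The sensitivity test can be sketched with a constant-velocity predictor fed clean versus noisy past tracks, with Gaussian noise standing in for tracking errors. The track coordinates and noise scale below are hypothetical, and real motion predictors are learned models rather than this one-line extrapolation.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_next(track):
    # Constant-velocity baseline: extrapolate one step using the
    # last observed displacement.
    return track[-1] + (track[-1] - track[-2])

# Hypothetical past track of one agent: four (x, y) positions.
history = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 1.5]])
future = np.array([4.0, 2.0])              # ground-truth next position

clean_err = float(np.linalg.norm(predict_next(history) - future))

# Corrupt the past track with tracking noise and average the resulting
# prediction error over many draws.
noisy_errs = []
for _ in range(1000):
    noisy = history + rng.normal(scale=0.3, size=history.shape)
    noisy_errs.append(np.linalg.norm(predict_next(noisy) - future))
noisy_err = float(np.mean(noisy_errs))
```

Even this trivial predictor degrades markedly once its input history is noisy, which is the qualitative effect the study measures on learned prediction models.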